[SPARK-52709][SQL] Fix parsing of STRUCT<> #55285
yadavay-amzn wants to merge 1 commit into apache:master
Hi @yadavay-amzn, thank you for your contribution!
495989d to ef02927
The GA failure seems not related to this change. Could you rebase to
When the lexer sees STRUCT<>, it tokenizes it as STRUCT + NEQ because NEQ (<>) is matched before LT (<) in the lexer. This increments complex_type_level_counter for STRUCT but never decrements it (no GT token), corrupting the counter for subsequent tokens. As a result, `SELECT CAST(null AS STRUCT<>), 2 >> 1` fails because >> is not recognized as shift-right.

The previous fix (apache#51480) modified the NEQ lexer rule but was reverted because it broke ARRAY(col1 <> col2) where <> is the not-equal operator.

This fix follows cloud-fan's suggestion to handle it at the parser level. When the parser matches STRUCT followed by NEQ in the dataType rule, it decrements the counter via an inline action. This is safe because the parser has confirmed that NEQ is being used as empty angle brackets in a type context, not as a comparison operator.

Closes SPARK-52709
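To make the counter behaviour above easier to follow, here is a toy Python model of it. It is only a sketch under stated assumptions, not Spark's actual ANTLR grammar: the `tokenize` function, the `parser_fix` flag, the token names, and the rule "decrement when NEQ directly follows STRUCT" are all hypothetical stand-ins for the lexer actions and the `dataType` parser action described in the commit message.

```python
import re

# Toy token patterns. Multi-character operators come before their
# single-character prefixes, so "<>" wins over "<" (NEQ before LT).
TOKEN_RE = re.compile(r"\s*(>>>|>>|<>|<|>|[A-Za-z_]\w*|\d+|[(),])")
COMPLEX_TYPES = {"STRUCT", "ARRAY", "MAP"}
NAMES = {"<>": "NEQ", "<": "LT", ">": "GT",
         ">>": "SHIFT_RIGHT", ">>>": "SHIFT_RIGHT_UNSIGNED"}


def tokenize(sql, parser_fix=False):
    """Return (token_names, final_counter) for the toy lexer."""
    sql = sql.strip()
    tokens, counter, pos = [], 0, 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        text, pos = match.group(1), match.end()
        if text in (">>", ">>>") and counter > 0:
            # Assumption: inside a complex type, each '>' is lexed as a
            # closing bracket (GT) rather than as part of a shift operator.
            for _ in text:
                counter = max(counter - 1, 0)
                tokens.append("GT")
            continue
        name = NAMES.get(text, text.upper())
        if name in COMPLEX_TYPES:
            counter += 1  # the type keyword opens one nesting level
        elif name == "GT":
            counter = max(counter - 1, 0)
        elif name == "NEQ" and parser_fix and tokens and tokens[-1] == "STRUCT":
            # Hypothetical stand-in for the parser action: STRUCT followed
            # by NEQ means empty angle brackets, so undo the increment.
            counter = max(counter - 1, 0)
        tokens.append(name)
    return tokens, counter
```

Calling `tokenize("SELECT CAST(null AS STRUCT<>), 2 >> 1")` in this model yields two `GT` tokens where the shift operator should be, while passing `parser_fix=True` yields a single `SHIFT_RIGHT`. `ARRAY(1 <> 2)` tokenizes identically in both modes because its `NEQ` does not directly follow `STRUCT`.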
ef02927 to d8dea9e
cloud-fan left a comment
The fix is correct and well-scoped. The parser-level approach is the right call — it avoids the lexer-level ambiguity that caused the previous revert (PR #51480), and the dataType rule guarantees STRUCT NEQ always means empty angle brackets (never a comparison operator).
Tests cover the key scenarios thoroughly, including the regression guard for ARRAY(1 <> 2).
@yadavay-amzn please reply in the JIRA ticket, so that I can assign it to you, thanks!
@cloud-fan Thanks for the review! I just commented on the JIRA ticket
What changes were proposed in this pull request?

Fix the `STRUCT<>` parsing bug by adding a parser-level action to decrement `complex_type_level_counter` when `STRUCT NEQ` is matched in the `dataType` rule.

Root cause: When the lexer sees `STRUCT<>`, it tokenizes it as `STRUCT` + `NEQ` (because `NEQ` is defined before `LT`). This increments the counter for `STRUCT` but never decrements it (no `GT` token), corrupting the counter. Subsequent `>>` then fails to be recognized as shift-right.

Previous attempt: PR #51480 was merged then reverted because it modified the `NEQ` lexer rule, which broke `ARRAY(col1 <> col2)` where `<>` is the not-equal operator.

This fix: Follows @cloud-fan's suggestion to handle it at the parser level. When the parser matches `STRUCT NEQ` in the `dataType` rule, we know `NEQ` is being used as empty angle brackets (not as a comparison), so we decrement the counter via an inline action.

Why are the changes needed?

`SELECT CAST(null AS STRUCT<>), 2 >> 1` fails with a syntax error because the corrupted counter prevents `>>` from being recognized as shift-right.

Does this PR introduce any user-facing change?

No. This is a correctness fix for SQL parsing.

How was this patch tested?

Added test in `SparkSqlParserSuite` covering 6 cases:
- `STRUCT<>` followed by `>>` (the original bug)
- `STRUCT<>` followed by `>>`
- `STRUCT<>` followed by `>>>` (unsigned shift right)
- `ARRAY(1 <> 2)` still works (the regression case from the reverted PR)
- `MAP<STRING, ARRAY<INT>>` still works

Test results:
- `SparkSqlParserSuite`: 46/46 passed
- `DataTypeParserSuite`: 59/59 passed
- `PlanParserSuite`: 80/80 passed

Was this patch authored or co-authored using generative AI tooling?

Yes